Conversation
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Merging this PR will degrade performance by 15.51%
Performance Changes
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
```diff
  */
 void initNativeResources() {
-  file = Files.open(partition.getPath());
+  file = Files.open(partition.getPath(), partition.getFormatOptions());
```
this is the thing we needed. everything else is just plumbing
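For context, a minimal sketch of what this plumbing looks like end to end: the reader's open call gains an overload that threads per-partition format options through to the file handle. The class and field names here (`VortexFiles`, `VortexFile`, the options map) are hypothetical stand-ins for the PR's internal API, not the real classes:

```java
import java.util.Map;

// Hypothetical stand-ins for the PR's internal reader API (names assumed).
final class VortexFile {
    final String path;
    final Map<String, String> options;
    VortexFile(String path, Map<String, String> options) {
        this.path = path;
        this.options = options;
    }
}

final class VortexFiles {
    // Old entry point: no per-partition options reach the native reader.
    static VortexFile open(String path) {
        return open(path, Map.of());
    }

    // New entry point: format options (e.g. S3 credentials) are threaded through.
    static VortexFile open(String path, Map<String, String> options) {
        return new VortexFile(path, options);
    }
}

public class OpenWithOptionsSketch {
    public static void main(String[] args) {
        Map<String, String> opts = Map.of("s3.access-key-id", "AKIA-example");
        VortexFile f = VortexFiles.open("s3://bucket/part-0.vortex", opts);
        System.out.println("options propagated: " + f.options.keySet());
    }
}
```

Everything else in the diff is wiring this second argument from the Spark scan down to the call site above.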
```java
import java.util.Map;
import java.util.Optional;

public final class VortexS3Properties {
```
i ripped this straight out of our iceberg fork
@claude please review. make no mistakes. be extremely javapilled
This comment was marked as resolved.
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@claude please re-review. I should have addressed all of the RED comments, and most of the green ones. I don't think the YELLOW ones are relevant. as always, stay javapilled.
This comment was marked as resolved.
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@claude ok i addressed the props in cache key thing. happy now?
Claude finished @a10y's task in 30s: Re-Review (Cache Key Fix)

✅ Yes, happy now.
```rust
fn url_cache_key(url: &Url, properties: &HashMap<String, String>) -> String {
    let mut sorted_props: Vec<_> = properties.iter().collect();
    sorted_props.sort_by_key(|(k, _)| *k);
    let props_str: String = sorted_props
        .iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",");
    format!("{}://{};{}", url.scheme(), &url[...], props_str)
}
```

The sort guarantees determinism regardless of insertion order, so two calls with the same credentials always resolve to the same cache entry, and two calls with different credentials (or no credentials vs. credentials) get distinct entries. The fix is correct.

One non-blocking observation: credentials end up as plaintext in an in-memory HashMap key. That's fine — it's never serialized or logged — but worth being aware of if this cache ever grows legs (e.g., persisted to disk, exposed via metrics). Not asking for a change, just flagging it.

All previously raised issues are now resolved. LGTM. 🚀
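The determinism argument carries over to the Java side of the PR as well. As a quick illustration, a sorted-entry key built from the same properties in different insertion orders compares equal; `cacheKey` below is an illustrative re-implementation of the same technique, not code from the PR:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CacheKeySketch {
    // Same idea as url_cache_key above: sort entries so the key is
    // independent of map insertion/iteration order. Names are illustrative.
    static String cacheKey(String schemeAndPath, Map<String, String> properties) {
        String props = new TreeMap<>(properties).entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
        return schemeAndPath + ";" + props;
    }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("s3.secret-access-key", "y");
        a.put("s3.access-key-id", "x");

        Map<String, String> b = new HashMap<>();
        b.put("s3.access-key-id", "x");
        b.put("s3.secret-access-key", "y");

        // Same credentials, different insertion order: identical keys.
        System.out.println(cacheKey("s3://bucket/file", a)
                .equals(cacheKey("s3://bucket/file", b)));
        // Different credentials (here: none at all): distinct keys.
        System.out.println(cacheKey("s3://bucket/file", a)
                .equals(cacheKey("s3://bucket/file", Map.of())));
    }
}
```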
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Summary
In Spark, pass format properties into the reader.
This mirrors the existing implementation of ParquetDataSourceV2 which rips the relevant S3/Azure/etc. properties off of the active SparkSession's hadoopConf. This is also similar to what we do in Vortex iceberg already.
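A rough sketch of the property plumbing described above: pull a known set of S3 keys off the Hadoop configuration and forward them as format properties. The key list and method names are assumptions for illustration (the real list lives in `VortexS3Properties`), and `hadoopConf` is modeled as a plain `Map` here rather than a real Hadoop `Configuration`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class S3PropsSketch {
    // Illustrative subset of fs.s3a.* keys; the real list is longer.
    static final String[] S3_KEYS = {
        "fs.s3a.access.key", "fs.s3a.secret.key", "fs.s3a.endpoint"
    };

    // hadoopConf is modeled as a plain Map; in Spark it would come from
    // the active session's Hadoop configuration.
    static Map<String, String> extractS3Properties(Map<String, String> hadoopConf) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : S3_KEYS) {
            String v = hadoopConf.get(key);
            if (v != null) {
                out.put(key, v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
            "fs.s3a.access.key", "AKIA-example",
            "fs.s3a.endpoint", "http://localhost:9090",
            "unrelated.key", "ignored");
        // Only the recognized S3 keys survive the extraction.
        System.out.println(extractS3Properties(conf));
    }
}
```

The extracted map is what would then be handed to the reader as format options (and, per the review thread above, folded into the URL cache key).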
Testing
Added a new test suite for reading/writing to S3 using the Adobe S3Mock testcontainer.